Named entities from Wikipedia for machine translation
Abstract
In this paper we present our attempt to improve machine translation of named entities by using Wikipedia. We recognize named entities based on the categories of English Wikipedia articles, extract their potential translations from the corresponding Czech articles, and incorporate them into a statistical machine translation system as translation options. Our results show a decrease in translation quality as measured by automatic metrics, but positive results from human annotators. We conclude that this approach can introduce many translation errors and should therefore always be combined with the standard statistical translation model and weighted appropriately.
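The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Article` record, the category cue words, the toy data, and the fixed `weight` value are all assumptions made for illustration, and a real system would read categories and interlanguage links from a Wikipedia dump and pass the options to the decoder.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Article:
    """Hypothetical record for one English Wikipedia article."""
    title: str
    categories: List[str]
    czech_title: Optional[str]  # None when no Czech counterpart exists


# Assumed cue words; the abstract does not list the category rules used.
NE_CATEGORY_CUES = {
    "PERSON": ("births", "deaths"),
    "LOCATION": ("cities", "capitals", "countries"),
}


def classify(article: Article) -> Optional[str]:
    """Return a named-entity label if any category contains a cue word."""
    for label, cues in NE_CATEGORY_CUES.items():
        for category in article.categories:
            if any(cue in category.lower() for cue in cues):
                return label
    return None


def translation_options(
    articles: List[Article], weight: float = 0.5
) -> Dict[str, Tuple[str, str, float]]:
    """Map English title -> (Czech title, label, weight) for recognized entities.

    The weight is a placeholder for how strongly the option should compete
    with the statistical translation model; its value here is arbitrary.
    """
    options = {}
    for article in articles:
        label = classify(article)
        if label is None or article.czech_title is None:
            continue  # not a named entity, or no Czech article to draw from
        options[article.title] = (article.czech_title, label, weight)
    return options


# Toy data, not taken from the paper.
ARTICLES = [
    Article("Prague", ["Capitals in Europe", "Cities in the Czech Republic"], "Praha"),
    Article("Machine translation", ["Computational linguistics"], "Strojový překlad"),
    Article("Springfield, Illinois", ["Cities in Illinois"], None),
]

options = translation_options(ARTICLES)
```

Keeping the weight explicit reflects the abstract's conclusion that such options should supplement the standard translation model, not override it.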
Similar resources
Improving Persian Named Entity Recognition Using the Ezafe Marker
Named entity recognition is the process of identifying the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages, in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Dudley North visits North London: Learning When to Transliterate to Arabic
We report the results of our work on automating the transliteration decision of named entities for English to Arabic machine translation. We construct a classification-based framework to automate this decision, evaluate our classifier both in the limited news and the diverse Wikipedia domains, and achieve promising accuracy. Moreover, we demonstrate a reduction of translation error and an impro...
Automatic acquisition of Named Entities for Rule-Based Machine Translation
This paper proposes to enrich RBMT dictionaries with Named Entities (NEs) automatically acquired from Wikipedia. The method is applied to the Apertium English–Spanish system, and its performance is compared to that of Apertium with and without hand-tagged NEs. The system with automatic NEs outperforms the one without NEs, while results vary when compared to a system with hand-tagged NEs (results are ...
Using Wikipedia for Named-Entity Translation
In this paper we present a system for translating named entities from Basque to English using Wikipedia's knowledge. We can exploit Wikipedia's interlingual links (WIL) to obtain named-entity translations, but entities without interlingual links can be translated using Wikipedia as a corpus, suggesting new interlingual links. In this second case the interlingual links can be used as a test c...
Generation of Bilingual Dictionaries using Structural Properties
Building bilingual dictionaries from Wikipedia has been extensively studied in the area of computational linguistics. These dictionaries play a crucial role in Natural Language Processing (NLP) applications such as Cross-Lingual Information Retrieval, Machine Translation and Named Entity Recognition. To build these dictionaries, most of the existing approaches use information present in Wikipedia tit...